Toward a perceptual object recognition system
Authors
Awad et al.
Abstract
It has been demonstrated that humans can easily recognize an object in less than 0.5 seconds [1]. For machines, however, object recognition remains one of the most challenging problems in computer vision. Many algorithms based on local approaches have been proposed in recent decades. Local approaches can be divided into four phases: region selection, region appearance description, image representation and classification [2]. Although these systems have demonstrated excellent performance, some weaknesses remain. The first limitation is in the region selection phase. Many existing techniques extract a large number of points or regions of interest. For instance, dense grids contain tens of thousands of points per image, while interest point detectors often extract thousands of points. Furthermore, some studies have demonstrated that these techniques were not designed to detect the most pertinent regions for object recognition: there is only a weak correlation between the distribution of extracted points and eye fixations [3]. The second limitation mentioned in the literature concerns the region appearance description phase. The techniques used in this phase typically describe image regions using high-dimensional vectors [4]. For example, SIFT, the most popular descriptor for object recognition, produces a 128-dimensional vector per region [5].
The main objective of this thesis is to propose a pipeline for an object recognition algorithm based on human perception that addresses the complexity of object recognition systems, namely query run time and memory allocation. In this context, we propose a filter based on a visual attention system [6] to address the large number of points of interest extracted by existing region selection techniques. We chose to use bottom-up visual attention systems that encode attentional fixations in a topographic map, known as a saliency map. This map serves as the basis for generating a mask that selects salient points, according to human interest, from the points extracted by a region selection technique [7]. Furthermore, we addressed the high dimensionality of descriptors in the region appearance description phase. We proposed a new hybrid descriptor representing the spatial frequency of some perceptual features extracted by a visual attention system (color, texture, intensity) [8]. This descriptor consists of a concatenation of energy measures computed at the output of a filter bank [9], at each level of the multi-resolution pyramid of perceptual features. It has the advantage of being lower dimensional than traditional descriptors.
Testing our filtering approach on VOC2005, using the Perreira da Silva system [10] as a filter, demonstrated that we can maintain approximately the same performance of an object recognition system by selecting only 40% of the extracted points (using Harris-Laplace [11] and Laplacian [12]), while substantially reducing complexity (40% reduction in query run time). Furthermore, evaluating our descriptor with an object recognition system using Harris-Laplace and Laplacian interest point detectors on the VOC2007 database showed a slight decrease in performance (5% reduction in average precision) compared to the original system based on the SIFT descriptor, but with a 50% reduction in complexity. In addition, we evaluated our descriptor using a visual attention system as the region selection technique on VOC2005. The experiment showed a slight decrease in performance (3% reduction in precision), but a drastically reduced complexity of the system (with 5% reduction in query run time and 70% reduction in complexity).
In this thesis, we proposed two approaches to manage the problems of complexity in object recognition systems. In the future, it would be interesting to address the problems of the last two phases of an object recognition system, image representation and classification, by introducing perceptually plausible concepts such as deep learning techniques.
Published in: Awad et al., Electronic Letters on Computer Vision and Image Analysis 14(3):11-12, 2015. Recommended for acceptance by Jorge Bernal. DOI: http://dx.doi.org/10.5565/rev/elcvia.714. ELCVIA ISSN: 1577-5097. Published by Computer Vision Center / Universitat Autònoma de Barcelona, Barcelona, Spain.
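The two steps the abstract describes, masking detected points with a saliency map and building a descriptor from filter-bank energies over a pyramid, can be illustrated with a small sketch. It is only a sketch under stated assumptions, not the author's implementation: the fixed saliency threshold, the 2x2 averaging pyramid, the three two-tap difference filters and the mean-squared-response energy are all hypothetical stand-ins chosen for brevity, and every function name is invented for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]        # (row, col) pixel coordinates
Image = List[List[float]]      # 2-D grid of values


def saliency_mask(saliency: Image, threshold: float) -> List[List[bool]]:
    """Binary mask that is True where the saliency map reaches the threshold.

    The fixed threshold is an assumption made for this sketch.
    """
    return [[value >= threshold for value in row] for row in saliency]


def select_salient_points(points: Sequence[Point],
                          mask: List[List[bool]]) -> List[Point]:
    """Keep only the detected points that fall inside the mask."""
    return [(r, c) for r, c in points if mask[r][c]]


def downsample(img: Image) -> Image:
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]


def filter_energy(img: Image, dr: int, dc: int) -> float:
    """Mean squared response of a two-tap difference filter with offset (dr, dc)."""
    h, w = len(img), len(img[0])
    responses = [img[r + dr][c + dc] - img[r][c]
                 for r in range(h - dr) for c in range(w - dc)]
    return sum(x * x for x in responses) / len(responses) if responses else 0.0


# Hypothetical filter bank: horizontal, vertical and diagonal differences.
FILTER_BANK = [(0, 1), (1, 0), (1, 1)]


def energy_descriptor(feature_maps: Sequence[Image], levels: int = 3) -> List[float]:
    """Concatenate filter-bank energies over every pyramid level of every map."""
    descriptor: List[float] = []
    for fmap in feature_maps:
        level = fmap
        for _ in range(levels):
            descriptor.extend(filter_energy(level, dr, dc) for dr, dc in FILTER_BANK)
            level = downsample(level)
    return descriptor
```

In this sketch the descriptor length is the number of feature maps times the number of pyramid levels times the number of filters, so three maps with three levels and three filters give 27 values. That figure only illustrates why such a descriptor can be much shorter than a 128-dimensional SIFT vector; it is not the dimensionality reported in the thesis.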
Related resources
Perceptual Organization of 3d Surface Points
Perceptual organization is proposed as a promising intermediate process toward object recognition and reconstruction from 3D surface points, which can be derived from aerial stereo-images, LIDAR data or InSAR data. Here, perceptual organization is to group sensory primitives originating from the same object and has been emphasized as a robust intermediate-level grouping process toward object re...
Perceptual organization and central coherence during visual processing in children with autism: evidence for disrupted functional connectivity in the autistic brain
Objective: A variety of evidence demonstrates altered perceptual functioning during visual processing in the brains of children with autism. It is possibly related to, or the cause of, other diagnostic symptoms in the autism spectrum. In the present study, visual perceptual organization in autistic children is studied. These processes require central coherence and typical functional connectivity among neural...
Urban Vegetation Recognition Based on the Decision Level Fusion of Hyperspectral and Lidar Data
Introduction: Information about vegetation cover and its health has always been of interest to ecologists because of its importance in terms of habitat, energy production and other important characteristics of plants on Earth. Nowadays, developments in remote sensing technologies have made more remotely sensed data accessible to researchers. The combination of these data improves the obje...
Near set Evaluation And Recognition (NEAR) System V2.0
This report presents the Near set Evaluation And Recognition (NEAR) system. The goal of the NEAR system is to extract perceptual information from images using near set theory, which provides a framework for measuring the perceptual nearness of objects. The contributions of this report are an introduction to the NEAR system as an application of near set theory to image processing, a feature-based...
Appearance-Based Recognition Using Perceptual Components
A fundamental problem with appearance-based recognition is how to encode the perceptual similarity between images as images need to be grouped based on their perceptual similarity. In this paper, we employ a spectral histogram model for generic appearance-based recognition. A perceptual component is defined as the spectral histogram of a training image, which encodes all the images perceptually...
Interaction between perceptual and cognitive processing well acknowledged in perceptual expertise research
To understand the neural correlates of expert object recognition, Harel et al. (2013) proposed the use of an existing theoretical framework (Mahon et al., 2007; Martin, 2007) that emphasizes the interaction between different parts of the visual pathway as well as between the visual and other cognitive systems. While we agree that focusing more on the role of these interactions in expertise acqu...
Journal title: Electronic Letters on Computer Vision and Image Analysis
Volume 14, issue 3
Pages 11-12
Publication date: 2015